Unsupervised Training for 3D Morphable Model Regression
We present a method for training a regression network from image pixels to 3D
morphable model coordinates using only unlabeled photographs. The training loss
is based on features from a facial recognition network, computed on-the-fly by
rendering the predicted faces with a differentiable renderer. To make training
from features feasible and avoid network fooling effects, we introduce three
objectives: a batch distribution loss that encourages the output distribution
to match the distribution of the morphable model, a loopback loss that ensures
the network can correctly reinterpret its own output, and a multi-view identity
loss that compares the features of the predicted 3D face and the input
photograph from multiple viewing angles. We train a regression network using
these objectives, a set of unlabeled photographs, and the morphable model
itself, and demonstrate state-of-the-art results.
Comment: CVPR 2018 version with supplemental material
(http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)
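The batch distribution loss above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes the morphable model's coefficient prior is a standard normal, so matching the output distribution reduces to penalizing deviations of the batch mean from zero and the batch variance from one:

```python
import numpy as np

def batch_distribution_loss(pred_coeffs):
    """Illustrative sketch of a batch distribution loss: encourage a batch of
    regressed morphable-model coefficients to match an assumed standard-normal
    prior. pred_coeffs: (batch, dims) array of predicted coordinates."""
    mean = pred_coeffs.mean(axis=0)   # per-dimension batch mean
    var = pred_coeffs.var(axis=0)     # per-dimension batch variance
    # Penalize deviation from zero mean and unit variance.
    return float(np.mean(mean ** 2) + np.mean((var - 1.0) ** 2))
```

A batch actually drawn from the prior yields a near-zero loss, while a collapsed batch (every prediction identical) is heavily penalized, which is the "network fooling" failure mode the objective guards against.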
Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
We address efficient and structure-aware 3D scene representation from images.
Nerflets are our key contribution -- a set of local neural radiance fields that
together represent a scene. Each nerflet maintains its own spatial position,
orientation, and extent, within which it contributes to panoptic, density, and
radiance reconstructions. By leveraging only photometric and inferred panoptic
image supervision, we can directly and jointly optimize the parameters of a set
of nerflets so as to form a decomposed representation of the scene, where each
object instance is represented by a group of nerflets. During experiments with
indoor and outdoor environments, we find that nerflets: (1) fit and approximate
the scene more efficiently than traditional global NeRFs, (2) allow the
extraction of panoptic and photometric renderings from arbitrary views, and (3)
enable tasks rare for NeRFs, such as 3D panoptic segmentation and interactive
editing.
Comment: Accepted by CVPR 2023.
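One way to picture how a set of local fields with positions and extents can jointly represent a scene is a Gaussian-weighted blend. The sketch below is a simplification under assumed isotropic Gaussian influence functions, not the paper's parameterization (which also includes orientation and panoptic outputs):

```python
import numpy as np

def nerflet_influence(x, centers, scales):
    """Illustrative sketch: each local field ("nerflet") has a position
    (center) and extent (scale); its Gaussian influence at a query point
    determines how much it contributes there.
    x: (3,) query point; centers: (n, 3); scales: (n,)."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distance to each center
    w = np.exp(-d2 / (2.0 * scales ** 2))     # Gaussian influence per nerflet
    return w / (w.sum() + 1e-8)               # normalized blending weights

def blended_density(x, centers, scales, densities):
    """Blend per-nerflet density predictions by their influence weights."""
    return float(nerflet_influence(x, centers, scales) @ densities)
```

Because each nerflet's influence decays with distance, only the few nerflets near a query point contribute meaningfully, which is where the efficiency gain over a single global NeRF comes from.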
OpenScene: 3D Scene Understanding with Open Vocabularies
Traditional 3D scene understanding approaches rely on labeled 3D datasets to
train a model for a single task with supervision. We propose OpenScene, an
alternative approach where a model predicts dense features for 3D scene points
that are co-embedded with text and image pixels in CLIP feature space. This
zero-shot approach enables task-agnostic training and open-vocabulary queries.
For example, to perform state-of-the-art zero-shot 3D semantic segmentation,
it first infers CLIP features for every 3D point and then classifies them
based on their similarity to embeddings of arbitrary class labels. More
interestingly, it enables a suite of open-vocabulary scene understanding
applications that were not previously possible. For example, it allows a
user to enter an arbitrary
text query and then see a heat map indicating which parts of a scene match. Our
approach is effective at identifying objects, materials, affordances,
activities, and room types in complex 3D scenes, all using a single model
trained without any labeled 3D data.
Comment: CVPR 2023. Project page: https://pengsongyou.github.io/openscene
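The zero-shot classification step described above can be sketched in a few lines. This is an illustrative stand-in, not OpenScene's code: the point features and label embeddings are assumed to already live in a shared CLIP-style space, and each point simply takes the label with the highest cosine similarity:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def zero_shot_labels(point_feats, label_embeds):
    """Illustrative sketch of zero-shot 3D segmentation: assign each point the
    label whose text embedding is most similar to the point's feature.
    point_feats: (n_points, d); label_embeds: (n_labels, d)."""
    return np.argmax(cosine_sim(point_feats, label_embeds), axis=1)
```

Because the label set only enters through its text embeddings, swapping in a new vocabulary at query time requires no retraining, which is what makes open-vocabulary queries possible.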
Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation
Computer vision models learn to perform a task by capturing relevant
statistics from training data. It has been shown that models learn spurious
age, gender, and race correlations when trained for seemingly unrelated tasks
like activity recognition or image captioning. Various mitigation techniques
have been presented to prevent models from utilizing or learning such biases.
However, there has been little systematic comparison between these techniques.
We design a simple but surprisingly effective visual recognition benchmark for
studying bias mitigation. Using this benchmark, we provide a thorough analysis
of a wide range of techniques. We highlight the shortcomings of popular
adversarial training approaches for bias mitigation, propose a simple but
similarly effective alternative to the inference-time Reducing Bias
Amplification method of Zhao et al., and design a domain-independent training
technique that outperforms all other methods. Finally, we validate our findings
on the attribute classification task in the CelebA dataset, where attribute
presence is known to be correlated with the gender of people in the image, and
demonstrate that the proposed technique is effective at mitigating real-world
gender bias.
Comment: To appear in CVPR 2020.
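The domain-independent training idea can be sketched at inference time as follows. This is a simplified illustration under assumed linear heads, not the paper's exact formulation: one classifier head is kept per protected-domain value, and class scores are summed across heads so that no single domain-conditioned decision boundary dominates the prediction:

```python
import numpy as np

def domain_independent_predict(feats, heads):
    """Illustrative sketch of domain-independent inference: one linear head
    per (hypothetical) protected-domain value; class scores are summed across
    heads before taking the argmax.
    feats: (n, d) features; heads: (n_domains, d, n_classes) weights."""
    scores = np.einsum('nd,kdc->nkc', feats, heads)  # per-domain class scores
    return np.argmax(scores.sum(axis=1), axis=1)     # sum over domains, pick class
```

Training each head only on examples from its own domain prevents the shared classifier from exploiting domain-correlated shortcuts, while the summed inference rule keeps the final prediction domain-agnostic.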
- …